Improving Word Alignment Quality using Morpho-syntactic Information

نویسندگان

  • Hermann Ney
  • Maja Popovic
چکیده

In this paper, we present an approach to include morpho-syntactic dependencies into the training of the statistical alignment models. Existing statistical translation systems usually treat different derivations of the same base form as they were independent of each other. We propose a method which explicitly takes into account such interdependencies during the EM training of the statistical alignment models. The evaluation is done by comparing the obtained Viterbi alignments with a manually annotated reference alignment. The improvements of the alignment quality compared to the, to our knowledge, best system are reported on the German-English Verbmobil corpus.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

This paper describes an unsupervised dynamic graphical model for morphological segmentation and bilingual morpheme alignment for statistical machine translation. The model extends Hidden Semi-Markov chain models by using factored output nodes and special structures for its conditional probability distributions. It relies on morpho-syntactic and lexical source-side information (part-of-speech, m...

متن کامل

SWIFT Aligner, A Multifunctional Tool for Parallel Corpora: Visualization, Word Alignment, and (Morpho)-Syntactic Cross-Language Transfer

It is well known that word aligned parallel corpora are valuable linguistic resources. Since many factors affect automatic alignment quality, manual post-editing may be required in some applications. While there are several state-of-the-art word-aligners, such as GIZA++ and Berkeley, there is no simple visual tool that would enable correcting and editing aligned corpora of different formats. We...

متن کامل

Boosting Statistical Machine Translation by Lemmatization and Linear Interpolation

Data sparseness is one of the factors that degrade statistical machine translation (SMT). Existing work has shown that using morphosyntactic information is an effective solution to data sparseness. However, fewer efforts have been made for Chinese-to-English SMT with using English morpho-syntactic analysis. We found that while English is a language with less inflection, using English lemmas in ...

متن کامل

Improving Dependency Parsing with Interlinear Glossed Text and Syntactic Projection

Producing annotated corpora for resource-poor languages can be prohibitively expensive, while obtaining parallel, unannotated corpora may be more easily achieved. We propose a method of augmenting a discriminative dependency parser using syntactic projection information. This modification will allow the parser to take advantage of unannotated parallel corpora where high-quality automatic annota...

متن کامل

Improving Function Word Alignment with Frequency and Syntactic Information

In statistical word alignment for machine translation, function words usually cause poor aligning performance because they do not have clear correspondence between different languages. This paper proposes a novel approach to improve word alignment by pruning alignments of function words from an existing alignment model with high precision and recall. Based on monolingual and bilingual frequency...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004